Automatic Selection, Recording and Meaningful Labeling of Clipped Tracks From Media Without an Advance Schedule

ABSTRACT

Automatic selection, recording and meaningful labeling of tracks from media streams is provided. Content information which relates to tracks currently being played and/or to previously played tracks is used to guide selection of tracks to be recorded and to provide meaningful labels for recorded tracks. This content information does not provide an advance schedule of tracks to be played in the future. A segment is temporarily recorded from a selected media stream. The content information relating to tracks in the segment is compared with previous user input (i.e., track preferences) to select tracks within the segment to be recorded. For each selected track, clipping is performed to identify track start and end times. Clipped tracks are recorded onto a suitable long-term recording medium, and labeled with a meaningful label derived from the content information. Content information can be derived by automatic analysis of the media stream.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/693,792, filed Dec. 4, 2012 and entitled “Automatic selection, recording and meaningful labeling of clipped tracks from broadcast media without an advance schedule”.

Application Ser. No. 13/693,792 is a continuation of U.S. patent application Ser. No. 12/313,017, filed Nov. 13, 2008 and entitled “Automatic selection, recording and meaningful labeling of clipped tracks from broadcast media without an advance schedule”.

Application Ser. No. 12/313,017 is a continuation in part of U.S. patent application Ser. No. 10/946,330, filed Sep. 20, 2004, entitled “Automatic selection, recording and meaningful labeling of clipped tracks from broadcast media without an advance schedule”.

Application Ser. No. 10/946,330 is a continuation in part of U.S. application Ser. No. 10/824,727, filed Apr. 14, 2004, entitled “Automatic selection, recording and meaningful labeling of clipped tracks from broadcast media without an advance schedule”.

FIELD OF THE INVENTION

This invention relates to recording of media.

BACKGROUND

In recent years, high-quality broadcast media (e.g., digital radio and digital television) and high-capacity, high-fidelity personal recording capability have become widely available. For example, a 100 GB magnetic disk drive can store high-fidelity recordings of roughly 15,000 to 50,000 music tracks, depending on resolution, and high-quality broadcast of such music tracks is becoming increasingly commonplace. As a result, the principal difficulties to be overcome in generating a library of recorded broadcasts for personal use are issues associated with creating, organizing and managing such a library. For example, if user input is required for each track (for recording, labeling and/or organizing), then generation of a large library of recorded tracks will be excessively time-consuming.

Automated selection and recording of broadcast media has been considered in the art, especially in connection with the TiVo® service offered by TiVo Inc. However, this service relies on advance schedule information and/or on special tags inserted into broadcast media streams in order to perform automatic selection and recording. For example, the system provided by TiVo Inc. typically provides advance schedule information to a user, and the user is then able to select shows for recording based on the advance schedule information. Such user selection can be manual (e.g., the user selects a particular show on a particular day for recording or triggers a recording button for immediate recording). The user selection can also be automatic (e.g., the user selects a particular type of show to be recorded, and the system automatically records all such shows found in the advance schedule when they are broadcast).

However, advance schedule information may not always be available, especially in broadcast radio. For example, an advance schedule for tracks played during a live radio call-in show is inherently impossible to provide. Moreover, some radio broadcasters are prohibited from providing an advance schedule of their programming content by current US copyright law. Furthermore, a timing discrepancy between advance schedule time and actual broadcast time is to be expected, and this discrepancy can be as much as a minute or so in current systems. Such a timing error is typically not a serious issue when recording television shows which are usually at least half an hour long, and are typically separated by lengthy commercial breaks. However, a timing error of that magnitude is unacceptable for recording music tracks which frequently have a total duration on the order of a few minutes, and are often played without intervening commercials. An advance schedule suitable for use in recording music in a hypothetical system similar to that of TiVo Inc. may be required to have a timing error of about a second or even less, which greatly increases the difficulty of providing such an advance schedule-based service.

Accordingly, it would be an advance in the art to provide automated selection and recording of broadcast media which does not require advance schedule information. It would be a further advance in the art to provide automated selection and recording of broadcast media that compensates for timing errors in content information used to make selections. It would be a further advance in the art to automatically provide meaningful labels for automatically recorded tracks.

SUMMARY

The present invention provides systems and methods for automatic selection, recording, and meaningful labeling of tracks from media streams. Content information which relates to tracks currently being played and/or to previously played tracks is used to guide selection of tracks to be recorded and to provide meaningful labels for recorded tracks. This content information does not provide an advance schedule of tracks to be played in the future. A segment is intermediate recorded from a selected media stream. The content information relating to tracks in the segment is compared with previous user input (i.e., track preferences) to select tracks within the segment to be long-term recorded. For each selected track, start and end times are determined and the track is clipped accordingly to accurately define the track. Clipped tracks are final recorded onto a suitable long-term recording medium, and labeled with a meaningful label derived from the content information. Optionally, the recorded tracks can be automatically organized according to their respective meaningful labels. Content information can be derived by automatic analysis of the media stream.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram of an automated selection and recording method according to an embodiment of the invention.

FIG. 2 shows relative timing of a stream 202, “currently playing” content information 204 for stream 202, and a buffered stream 206 obtained by time-delaying stream 202.

FIGS. 3a and 3b show methods for batch and triggered recording, respectively, according to embodiments of the invention.

FIGS. 4a and 4b show methods for static and dynamic stream selection, respectively, according to embodiments of the invention.

DETAILED DESCRIPTION

FIG. 1 is a flow diagram of an automated selection and recording method according to an embodiment of the invention. In the example of FIG. 1, it is assumed that several broadcast media streams are available for recording, and that content information specifying the tracks currently playing on each media stream is available. Other embodiments of the invention, discussed in connection with FIG. 3a, can make use of content information relating to previously played tracks.

Throughout this description, content information can include information such as track title and/or track artist and/or track album, etc. Content information can be made available either by a broadcaster or by a third party. Furthermore, content information may relate to tracks that are currently playing on broadcast media streams, or can relate to tracks that have been previously played on broadcast media streams. Content information can also be augmented with user or third-party stream descriptions. For example, a user or a third party may designate station X as “70s rock” and this designation can be included in content information for tracks recorded from station X. As used herein, content information does not include schedule information on tracks to be played in the future.

The first step of this method is to provide user preferences 102. User preferences 102 include track preferences, such as preferred artists, album titles and/or track titles. Such track preferences are used to automatically select tracks for recording in accordance with the invention. Optionally, the method can derive track preferences from user input and/or past history. For example, a list can be maintained of all tracks that have been recorded to date, and such a list allows recording of duplicates to be automatically avoided. In such cases, a user can select whether or not to prevent duplicate recording.
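As a minimal sketch of how track preferences and a recorded-track history might be combined, consider the following. The names `TrackInfo`, `TrackPreferences` and `wants` are hypothetical and do not appear in this description; the sketch also assumes that preference entries are stored in lower case and that a duplicate is defined by matching artist and title.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TrackInfo:
    """Content information for one track (hypothetical structure)."""
    artist: str
    title: str
    album: str = ""


@dataclass
class TrackPreferences:
    """User track preferences; entries are assumed to be lower case."""
    artists: set = field(default_factory=set)
    albums: set = field(default_factory=set)
    titles: set = field(default_factory=set)
    avoid_duplicates: bool = True  # user-selectable duplicate prevention


def wants(track, prefs, already_recorded):
    """Return True if the track matches the preferences and should be recorded.

    already_recorded is a set of (artist, title) pairs in lower case,
    standing in for the list of all tracks recorded to date.
    """
    key = (track.artist.lower(), track.title.lower())
    if prefs.avoid_duplicates and key in already_recorded:
        return False
    return (track.artist.lower() in prefs.artists
            or track.album.lower() in prefs.albums
            or track.title.lower() in prefs.titles)
```

Setting `avoid_duplicates` to `False` corresponds to the user choosing not to prevent duplicate recording.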

User preferences 102 can optionally include stream preferences, such as a station to monitor, a fixed monitoring schedule including several stations monitored at different times, and/or a monitoring schedule having at least two stations monitored simultaneously (which requires a multi-channel recording system). Alternatively, stream (or channel) selection can be based on track preferences and content information. For example, a user directive to “record anything by artist X broadcast on any channel” can be used to govern stream selection. Of course, such a directive is typically not absolute, and results will depend on stream characteristics and on the performance of the system used to implement the method. For example, if two different songs by artist X are simultaneously playing on two different streams, then a single-channel system will be able to record only one of the two songs. Stream selection can also be governed by user-supplied rules combined with historical data. For example, a user can supply a rule to de-select channels which have a high duplication rate of tracks already recorded. Conversely, a user can also supply a rule to select channels whose programming has many non-duplicate tracks of interest to the user.

The second step of the method of FIG. 1 is optional buffering 104. In this context, buffering a media stream entails receiving the media stream into a temporary storage device and outputting a buffered media stream from the temporary storage device. The buffered media stream is a faithful replica of the input media stream, except for a fixed time delay. Suitable temporary storage devices for providing such buffering of a media stream are known, and are especially easy to provide for digital media streams. In some cases, a stream may be selected after broadcast of a desired track has begun, and in such cases, buffering the media streams is preferred to enable recording of the entire desired track. This can be done by recording from the buffered media stream, and ensuring the buffer delay is longer than the delay between the start of the desired track and completion of stream selection. In some cases, the content information may be included in the same physical signal as one or more media streams, and in such cases, the incoming signal can be duplicated with one part being delayed by buffering and treated as the media stream, and the other part not being delayed and being treated as the content information. For the purposes of this description, buffering 104 provides relatively short time delays, and is preferably implemented as a RAM cache within chip-based memory.
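The fixed-delay behavior of such a buffer can be illustrated with a short sketch. The class name `DelayBuffer` is hypothetical, and the delay is expressed in samples (an assumption made here for simplicity); the output is a replica of the input delayed by that many samples.

```python
from collections import deque


class DelayBuffer:
    """In-memory buffer that replays its input after a fixed delay."""

    def __init__(self, delay_samples):
        if delay_samples < 0:
            raise ValueError("delay must be non-negative")
        self._delay = delay_samples
        self._queue = deque()

    def push(self, sample):
        """Store one input sample; return the delayed sample, or None
        while the buffer is still filling."""
        self._queue.append(sample)
        if len(self._queue) > self._delay:
            return self._queue.popleft()
        return None
```

Recording from the values returned by `push` rather than from the live input is what allows a track to be captured from its beginning even if selection completes after the track has started.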

The next step of the method of FIG. 1 is stream selection 106, as discussed above. In this example, two streams 116 and 118 are available, and a desired track 120 is identified on stream 118 based on content information and user track preferences. Thus stream 118 is selected.

The next step of the method of FIG. 1 is intermediate recording 108 of a segment of selected stream 118. Intermediate recording 108 can make use of any recording medium, such as RAM or a magnetic recording medium (e.g., a disk drive). In cases where recording 108 records to a non-volatile medium (e.g., magnetic or optical storage), the resulting recorded segment is called “intermediate” in this description (even though it is a non-volatile recording) because further processing will be performed on the recorded segment to arrive at the desired final track recordings. Segment start and end times 122 are shown on FIG. 1. Buffering as discussed above can be used to ensure the segment start time is before the start time of track 120. By monitoring the content information of stream 118, the start time of the track following track 120 on stream 118 (and thus the end time of track 120) is known. Segment recording preferably extends past this end time by a suitable time interval to ensure the segment includes all of track 120.

The next step of the method of FIG. 1 is clipping 110 of selected track 120 within the recorded segment. Clipping 110 entails automatically determining the start and end times of selected track 120. Such a determination can be made by known methods. For example, intervals of silence can be located in a segment of a media stream by a digital signal processor (DSP) (implemented in hardware and/or in software) to determine track start and end times. If background noise is present, then a DSP may use relative silence, rather than absolute silence, as a guide to determine clipping points. If timing markers are available from the content information, these can be used to determine clipping points, either alone or in conjunction with the above DSP methods. Such timing markers can be provided as a real time stream relating to currently playing tracks, or as a log of start and end times of previously played tracks. Clipping 110 can be performed with a much greater degree of precision than can be expected from a system relying on advance schedule information for track start and end times, and this improved precision is a significant advantage of the invention.
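One simple software form of silence-based clipping is sketched below. It is an illustration only, not a description of any particular DSP: the function names, the amplitude `threshold` (which plays the role of “relative silence” when background noise is present) and the minimum run length `min_len` are all assumptions introduced here.

```python
def find_silences(samples, threshold, min_len):
    """Return (start, end) index pairs, end exclusive, for every run of
    at least min_len samples whose magnitude is at or below threshold."""
    silences = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                silences.append((run_start, i))
            run_start = None
    # Handle a silent run that extends to the end of the segment.
    if run_start is not None and len(samples) - run_start >= min_len:
        silences.append((run_start, len(samples)))
    return silences


def track_spans(samples, threshold, min_len):
    """Return (start, end) index pairs for the audible spans that lie
    between the detected silences; these are candidate track boundaries."""
    spans = []
    cursor = 0
    for start, end in find_silences(samples, threshold, min_len):
        if start > cursor:
            spans.append((cursor, start))
        cursor = end
    if cursor < len(samples):
        spans.append((cursor, len(samples)))
    return spans
```

In practice the candidate spans would be reconciled with any timing markers from the content information, since short pauses inside a track can also fall below the threshold.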

The next step of the method of FIG. 1 is final recording 112 of clipped and selected track 120. Track 120 is recorded between the start and end times determined by clipping 110. Recording 112 entails generating a meaningful label 126 for the recorded track. Labels are referred to as “meaningful” only if they are derived from relevant content information. For example, a label “track00” is not a meaningful label of a recording of Beethoven's 9th symphony, while a label “Beethoven Symphony 9” is a meaningful label for such a track. Such labels can be file names, or such labeling can be implemented in an associated database relating file names to labels. For example, recorded tracks could have purely numerical file names, and a database relating numerical file names to meaningful labels (e.g., artist, album title, song title, etc.) can be automatically constructed, maintained and updated. Optionally, recording 112 also includes organizing the recorded track according to its label (e.g., inserting the file into a directory tree 128). For example, a meaningfully labeled recording file can be inserted into a hierarchical directory structure organized by genre, artist and album title in increasing order of specificity. Final recording 112 can be to any recording medium, such as an optical recording medium, a magnetic recording medium, or a nonvolatile semiconductor memory medium. Preferably, final recording 112 is to a magnetic recording medium, such as a conventional computer hard disk drive.
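The labeling and directory placement described above might be sketched as follows. The helper names, the “artist - title” label layout and the `.mp3` extension are illustrative assumptions rather than requirements of the method.

```python
import re
from pathlib import PurePosixPath


def safe(name):
    """Replace characters that are awkward in file names; fall back to
    'Unknown' when a field of the content information is empty."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", name).strip()
    return cleaned or "Unknown"


def label_and_path(genre, artist, album, title, ext=".mp3"):
    """Build a meaningful label and a genre/artist/album path for a track."""
    label = f"{artist} - {title}"
    path = PurePosixPath(safe(genre), safe(artist), safe(album),
                         safe(label) + ext)
    return label, path
```

For example, `label_and_path("Classical", "Beethoven", "Symphonies", "Symphony 9")` yields the label `Beethoven - Symphony 9` and a path three directories deep, ordered by increasing specificity.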

Since genre information is often assumed to be obvious given the nature of the broadcast stream, genre information may be derived from user provided, third party provided or automatically generated genre descriptions for a given stream. This genre information can be combined with the content information to create a greater degree of labeling and/or organizing accuracy for each track. Such labeling and organization is largely independent of the physical nature of the recording medium. Automatic labeling of recorded tracks with meaningful content-based labels, as discussed above, is a significant advantage of the invention compared to automatic recording methods which only provide meaningless labels (e.g., numeric labels or date/time/station labels) that have no relevance to the track content. For example, an automatic recording system without automatic meaningful labeling of tracks can confront a user with a daunting and tedious task of manually labeling hundreds or even thousands of recorded tracks.

Finally, a decision 114 is made whether or not to continue monitoring and recording. If “yes”, the method flow returns to a point before step 106. If “no”, the method flow terminates.

FIG. 2 shows relative timing of a stream 202, “currently playing” content information 204 for stream 202, and a buffered stream 206 obtained by time-delaying stream 202. In stream 202, track start times for several consecutive tracks are indicated as 202a-e. Times 204a-e are the times when content information 204 is updated to account for the playing of tracks beginning at times 202a-e respectively. As shown on FIG. 2, content information may be available immediately (e.g., 202c and 204c), or it may only become available after a track has started playing (e.g., 202a, b, d and 204a, b, d). Furthermore, this delay may vary from track to track as shown on FIG. 2. In some cases (e.g., 202e and 204e), content information may be available slightly before (e.g., less than 1 s) the corresponding track starts. In the context of FIG. 2, it is assumed that content information is available for currently playing tracks, possibly with a slight delay. Other embodiments of the invention can make use of content information on previously played tracks, and are discussed in connection with FIG. 3a.

Since stream 206 is a buffered copy of stream 202, it is the same as stream 202 except for a time delay 208. Thus 206a-e are delayed track start times corresponding to track start times 202a-e respectively. Time delay 208 is preferably larger than a maximum delay 210 between track start time and content information availability, since such buffering is sufficient to ensure recording an entire track in the presence of a slight delay in content information availability. In addition, it may require a non-negligible time Ts to switch from one stream to another. In such cases, the buffer time delay 208 is preferably greater than or equal to time delay 210 plus Ts, to enable recording of an entire track in the presence of both time delays.
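The sizing rule for time delay 208 can be written as a small calculation. In this sketch the function name and the idea of estimating maximum delay 210 from a list of observed delays are assumptions; a negative observed delay represents content information that arrived slightly before the track started, and it contributes nothing to the required buffer.

```python
def required_buffer_delay(observed_content_delays, switch_time=0.0):
    """Return the smallest buffer delay (seconds) that covers the largest
    observed content information delay plus the stream switching time Ts."""
    max_delay = max(0.0, max(observed_content_delays, default=0.0))
    return max_delay + switch_time
```

With observed delays of 2.0, 0.0, 3.5 and -0.5 seconds and a switching time Ts of 1.5 seconds, the rule gives a buffer delay of at least 5.0 seconds.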

Embodiments of the invention can operate in various modes. For example, either batch or triggered recording can be performed. In triggered recording, the recording of a segment from a selected media stream is responsive to an indication from content information that a track of interest is playing. In batch recording, a segment from a selected media stream is recorded without reference to the content information, and then content information for the recorded segment is used to determine if tracks of interest are present in the recorded segment. To clarify the difference between these two modes, FIGS. 3a and 3b show methods for batch and triggered recording, respectively, according to embodiments of the invention.

The first step in FIG. 3a (batch mode) is intermediate segment recording 302. Segment recording 302 can be to either a volatile or a nonvolatile physical medium. For example, intermediate segment recording 302 can entail continuous recording of a stream for a long time (e.g., several hours) onto a magnetic disk drive, where the recorded segment is subsequently processed to locate, clip, finally record and label tracks of interest.

In this example, and throughout this description, “intermediate recording” and “final recording” are used in a broad sense. In particular, final recording can entail the transfer of information from one location to another location (e.g., in cases where intermediate recording is to a semiconductor memory, and final recording is to a magnetic disk drive). Final recording can also entail the rearrangement or relabeling of information already stored at one location. For example, if intermediate recording is to a magnetic disk drive, then final recording can entail manipulation of data already stored on the disk drive to transform recorded segments to recorded, clipped and labeled tracks.

To obtain content information for the recorded segment, a past play list can be automatically constructed 304 by monitoring “currently playing” content information during recording 302. Alternatively, a past play list can be obtained 306 after completion of recording 302. For example, the play list for a radio show may be made available (e.g., on the internet) by a broadcaster (or a third party) some time after completion of the show. Such a play list is suitable content information for a recorded segment including the radio show. Once content information for the recorded segment is available, this content information is compared with user track preferences to select 308 which tracks, if any, in the recorded segment should be recorded. Selecting 308 is preferably implemented in conventional computer software to maximize flexibility and capability and minimize cost. The selected tracks are then clipped 310 and finally recorded 312 as discussed above.
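A minimal sketch of selecting 308 from a past play list follows. It assumes, for illustration only, that the play list is a time-ordered list of `(start_seconds, artist, title)` entries measured from the start of the recorded segment, that preferences are a set of artist names, and that each track's approximate end is the start of the next entry.

```python
def select_from_play_list(play_list, segment_end, wanted_artists):
    """Return (start, end, artist, title) for each play list entry whose
    artist is wanted. Times are approximate and are refined by clipping."""
    selected = []
    for i, (start, artist, title) in enumerate(play_list):
        # The next entry's start time serves as this track's end time;
        # the last track is bounded by the end of the recorded segment.
        if i + 1 < len(play_list):
            end = play_list[i + 1][0]
        else:
            end = segment_end
        if artist in wanted_artists:
            selected.append((start, end, artist, title))
    return selected
```

The approximate times returned here would then be passed to clipping 310 to locate the actual track boundaries.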

The first step in FIG. 3b (triggered mode) is monitoring 314 of content information. When a track of interest to the user is known to be playing based on the content information, triggered recording 316 of a segment of a stream including the track of interest occurs. As discussed in connection with FIGS. 1 and 2, buffering of media stream inputs is typically required to ensure recording the entire track of interest. Within the recorded segment, the track of interest is selected, based on content information and user track preferences, and then the selected track is clipped 318 and final recorded 312 as discussed above.

The batch mode of FIG. 3a is simpler than the triggered mode of FIG. 3b, mainly because buffering of media stream inputs typically is not required for batch mode recording and typically is required for triggered mode recording. However, triggered mode recording provides more flexibility to the user, and is especially advantageous for catching and recording tracks which are rarely broadcast. Thus either of these two modes may be preferred, depending on circumstances. In a multi-channel system, these two modes could be practiced simultaneously, where some channels of the system operate in batch mode and other channels operate in triggered mode.

In addition to batch and triggered recording modes, the invention can be practiced with either static stream selection or dynamic stream selection. In static stream selection, streams are selected based on user stream preferences. In dynamic stream selection, streams are selected based on user track preferences and content information. In a multi-channel system, these two modes could be practiced simultaneously, where some channels of the system operate with static stream selection and other channels operate with dynamic stream selection. FIGS. 4a and 4b show static and dynamic stream selection, respectively, according to embodiments of the invention.

FIG. 4a shows a method of an embodiment of the invention having static stream selection. In step 402, a stream is selected based on user stream preferences. Such user stream preferences can specify a station, and/or a listening schedule (i.e., which stations to listen to at which times). Furthermore, in a multi-channel system, user stream preferences can specify more than one station and/or more than one listening schedule to be simultaneously monitored. Once a stream (or streams) is selected in step 402, then automated selection and recording 404 of tracks from the selected stream(s) is performed as discussed above. Static stream selection is usually practiced with batch recording, but can also be practiced with triggered recording.

FIG. 4b shows a method of an embodiment of the invention having dynamic stream selection. In step 406, content information for several streams is monitored. In step 408, at least one stream is selected based on content information and user track preferences. For example, a channel may be abandoned or avoided for playing too many tracks which have already been recorded or, conversely, the system may detect a channel on which many previously unrecorded tracks of interest are being played and change to that channel. This kind of dynamic stream selection can be practiced in connection with batch recording. Alternatively, dynamic stream selection in connection with triggered recording can be practiced. For example, if user preferences indicate that songs by artist X are to be recorded, then a stream can be dynamically selected because it is currently playing a song by artist X that has not been previously recorded. Following step 408, automated selection and recording 404 of tracks from the selected stream(s) is performed as discussed above. Dynamic stream selection is usually practiced with triggered recording, but can also be practiced with batch recording.
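The artist X example of dynamic stream selection can be sketched for a single-channel system as follows. The function name, the dictionary layout of the “currently playing” content information and the tie-breaking by stream name are all assumptions made for illustration.

```python
def choose_stream(now_playing, wanted_artists, recorded):
    """Pick one stream that is currently playing a wanted, not yet
    recorded track; return None if no stream qualifies.

    now_playing maps a stream name to an (artist, title) pair, and
    recorded is a set of (artist, title) pairs already recorded.
    """
    # Streams are visited in name order so the choice is deterministic
    # when several streams qualify at once.
    for stream in sorted(now_playing):
        artist, title = now_playing[stream]
        if artist in wanted_artists and (artist, title) not in recorded:
            return stream
    return None
```

When two streams qualify at the same time, a single-channel system can follow only one of them, which is why the sketch returns a single stream name.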

The preceding description relates to methods of the invention, and also provides various implementation details. Processors programmed to implement methods of the invention are also embodiments of the invention. Such embodiments can be stand-alone “set-top” boxes, or can be general purpose computers (e.g., “living room PCs”) running software implementing methods of the invention. Such processors can use any combination of hardware and/or software to implement methods of the invention. The invention can also be embodied as a set of computer instructions recorded onto a computer-readable medium (e.g., an optical or magnetic disk) for implementing methods of the invention.

In the preceding description, “recording” is to be understood in broad terms. Thus recording of a segment can be to a magnetic (or optical) storage medium, or recording of a segment can entail temporary storage of the segment in a processor (or computer) buffer. In some cases, segment durations can be 15 minutes or more, which is typically long enough to include several song tracks. Such long segments are desirable for providing margin before and after track start times. For example, it is often preferred for the segment duration to exceed an estimated maximum track length by a margin of about 20 seconds.

Segment recording according to the present invention can be employed with segments having adjustable duration. For example, a segment recorded to a magnetic disk drive medium (or stored in a processor buffer) can be extended as more data is intermediate recorded from the relevant broadcast media stream. Such an adjustable segment can also be decreased in length by processing its recorded information (e.g., searching for desired tracks, and clipping and final recording the desired tracks and discarding the undesired material as discussed above). Once part of the segment has been processed, the processed fraction of the segment can be removed from the segment, thereby decreasing its duration.

Clipping of tracks according to automatically determined track start and end times can be performed in various ways. One approach, as considered above, is to clip the tracks at the estimated start and end times. However, in some cases it is preferable to provide a margin against error in clipping, by clipping before the estimated start time by a start time margin and clipping after the estimated end time by an end time margin. For example, these start and end time margins can be about 5-10 seconds. Provision of such margins leads to clipped tracks which are unlikely to be clipped incorrectly such that part of the desired track is lost during clipping.

Such clipping with margins can lead to a situation where two consecutive desired tracks are clipped in such a way that they overlap within the recorded segment. For example, if two consecutive desired tracks are separated by 5 seconds and 10 second clip margins are used, overlap of the clipped tracks will occur. Such overlapping clipping is most easily performed in “batch mode” recording as discussed above, where the recorded segment is readily available for overlapping clipping.
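The margin arithmetic and the resulting overlap can be illustrated with a short sketch. The function names and the clamping of clip points to the bounds of the recorded segment are assumptions; times are in seconds from the start of the segment.

```python
def clip_with_margins(start, end, segment_len,
                      start_margin=10.0, end_margin=10.0):
    """Widen estimated track times by the margins, staying inside the
    recorded segment, and return the (clip_start, clip_end) pair."""
    clip_start = max(0.0, start - start_margin)
    clip_end = min(segment_len, end + end_margin)
    return clip_start, clip_end


def overlaps(a, b):
    """Return True if two (start, end) clips share any part of the segment."""
    return a[0] < b[1] and b[0] < a[1]
```

With tracks estimated at 100-300 s and 305-500 s, which are 5 seconds apart, 10 second margins give clips of 90-310 s and 295-510 s, so the two clips share the interval from 295 s to 310 s.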

When clipping with margins is performed, it is preferred to provide the final recorded track in a media file format having user-adjustable start and end time information. More specifically, suppose the total duration of a clipped and final recorded track is T. Without loss of generality, this track can be regarded as extending from 0 ≤ t ≤ T. Start and end times T₁ and T₂ respectively are defined in the media file (e.g., in a header) such that playback of the track begins at t=T₁ and ends at t=T₂. Provision of user adjustable start and end times in the media file format permits a user to effectively fine-tune the track clipping as needed or desired. For example, if the track starts at a time Tₐ>0, setting the media file start time T₁ to a value between 0 and Tₐ reduces the unwanted/irrelevant time at the beginning of track playback as much as desired. Similarly, unwanted/irrelevant time at the end of track playback can also be adjusted in this manner. Since the parameters T₁ and T₂ are stored as part of the media file format, such adjustment can be performed once and be effective on all subsequent playbacks of the track.
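A header carrying adjustable playback times might look like the following sketch. The class and field names are hypothetical and do not correspond to any existing media file format; `play_start` and `play_end` stand for T₁ and T₂, and `duration` stands for T.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClippedTrackHeader:
    """Playback window stored alongside a clipped track (hypothetical)."""
    duration: float                    # T, total length of the clipped track
    play_start: float = 0.0            # T1, where playback begins
    play_end: Optional[float] = None   # T2, where playback ends

    def __post_init__(self):
        # By default the whole clipped track, margins included, is played.
        if self.play_end is None:
            self.play_end = self.duration

    def set_play_window(self, t1, t2):
        """Adjust T1 and T2 without altering the recorded audio itself."""
        if not (0.0 <= t1 < t2 <= self.duration):
            raise ValueError("require 0 <= T1 < T2 <= T")
        self.play_start, self.play_end = t1, t2
```

Because only the header values change, the adjustment is reversible: the margins remain in the file and the window can be widened again later.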

As indicated above, content information can be provided in various forms. In some cases, processing may be required to make existing forms of content information more suitable for use with the present invention. For example, content information is often provided as a video display of text (e.g., showing song title, artist name, video director, album title, record label and/or other information). Such a video display can be on the same stream being recorded (e.g., a TV station broadcasting music and continually displaying content information, or a TV station showing music videos including content information for part of the broadcast). Such a video display can also be on a stream other than a stream being recorded (e.g., content information from a “TV guide” channel). In these cases, optical character recognition (OCR) techniques can be used to extract the textual content information from the video display. Such OCR techniques are well known in the art.

For example, automatic recording of music videos can be accomplished according to the invention by use of OCR techniques to recognize or extract textual content information from video display content information. Once such content information is obtained, it can be used, as described above, to automatically record, clip and meaningfully label and organize desired music video tracks. In this manner, an organized library of recorded music videos can be automatically generated in accordance with a user's preferences.

Extraction of textual content information can be performed in various ways. For example, the electronic video signal itself can be electronically processed to extract textual content information (either as text or as a 2-D pattern to be subjected to character recognition methods). An alternative method is to optically capture the video display (e.g., with a camera or other imaging device). Preferably, a digital camera is employed for this purpose. The image provided by the imaging device can then be processed to separate textual information from other parts of the image, and to recognize characters within this textual information.

The preceding examples relate to broadcast media streams where content information other than an advance schedule is available for use. However, embodiments of the invention are also applicable to clipping of any media stream, whether or not it is regarded as being broadcast. For example, automatic clipping and meaningful labeling of security surveillance video streams would be highly desirable, and such video streams are not usually thought of as being “broadcast”. For this application, it will suffice to define a broadcast media stream as a media stream which can be simultaneously received at two or more separated locations. A non-broadcast media stream is a media stream which cannot be simultaneously received at two or more separated locations.

For a non-broadcast media stream, the content information required for automatic clipping and labeling can be derived by automatic analysis of the media stream itself, as described in greater detail below. Such automatic derivation of content information from the media stream itself can also be useful in connection with broadcast media streams. For brevity, it is convenient to refer to this process of deriving content information from the stream itself (broadcast or non-broadcast) as “stream content derivation”.

Embodiments of the invention are applicable to automatic clipping and labeling of broadcast and non-broadcast media streams, including but not limited to: radio broadcasts, television broadcasts, web feeds, podcasting, Really Simple Syndication (RSS) feeds, audio surveillance feeds, video surveillance feeds, audio/video surveillance feeds, streaming audio clips, streaming video clips, and streaming audio/video clips. Embodiments of the invention are also applicable in cases of re-broadcasting or re-transmission of a media stream. For example, a broadcast or surveillance feed can be recorded raw (i.e., without any processing), and then clipping and labeling can be performed during playback of the raw recording. As a further example, raw video available from the internet as a streaming media clip can be processed in this manner to provide clipped and labeled tracks.

Stream content derivation can be performed in various ways. Some examples will help illustrate the possibilities. One can analyze television news streams for the name “Kevin Bacon” using existing speech analysis methods. A user-defined methodology may then be employed to create a buffer both before and after this name to clip the segment. One simple methodology is clipping a fixed time (e.g., 5 seconds) before and after the name is spoken. A more sophisticated methodology could look for pauses in speech or large changes in the video image before and after the occurrence to trigger clipping. Given enough processing power, this analysis could search a media stream for thousands of markers like “Kevin Bacon” within the speech and clip hundreds of segments accordingly on many channels at the same time.
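The simple fixed-buffer methodology can be sketched as follows. This is an illustrative sketch under stated assumptions: the speech analysis step is assumed to have already produced a list of (time in seconds, word) pairs with punctuation removed, and the function names are hypothetical. Merging overlapping windows into a single clip is a design choice of this sketch, not a requirement of the description.

```python
def find_hits(transcript, marker):
    """Return the times at which the marker phrase begins.

    transcript: list of (time_seconds, word) pairs, assumed to come from
    a speech analysis step that is not shown here.
    """
    words = marker.lower().split()
    hits = []
    for i in range(len(transcript) - len(words) + 1):
        window = [w.lower() for _, w in transcript[i:i + len(words)]]
        if window == words:
            hits.append(transcript[i][0])
    return hits


def clip_windows(hits, pad=5.0, stream_end=None):
    """Return (start, end) clip windows of `pad` seconds around each hit.

    Windows that overlap are merged so one clip covers nearby occurrences.
    """
    windows = []
    for t in sorted(hits):
        start = max(0.0, t - pad)
        end = t + pad if stream_end is None else min(stream_end, t + pad)
        if windows and start <= windows[-1][1]:
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    transcript = [
        (10.0, "actor"), (10.5, "Kevin"), (11.0, "Bacon"),
        (14.0, "Kevin"), (14.4, "Bacon"),
        (60.0, "Kevin"), (60.5, "Bacon"),
    ]
    print(clip_windows(find_hits(transcript, "Kevin Bacon")))
```

The more sophisticated methodology would replace the fixed `pad` with boundaries found by searching outward from each hit for a pause in speech or a large change in the video image.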

An RSS feed or periodic podcast from a video or audio blogging site may be monitored for occurrences as above and clipped in a similar manner. Once again, where one cannot know in advance the schedule of what will be discussed or shown, a content stream may be created from the streaming media itself through various methods of content analysis. This content stream may then provide information for intelligently clipping, naming and filing the segments.

Many previously recorded streams for which there is no published “content schedule” would be analyzable using such methods as well. For example, decades of past C-SPAN broadcasts archived as digital media may be “re-run” and analyzed for content markers such as “Nixon,” “Carter,” “Reagan,” “Vietnam,” “Bay of Pigs,” etc. Such content markers can be regarded as being “user track preferences” in the above described methods, because the user is effectively looking for tracks which contain instances of the specified content markers. This methodology would allow content information to be created from these streaming media for intelligently clipping, naming and filing the segments.
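Naming and filing of clipped segments from content markers can be sketched as follows. The label format, the one-directory-per-marker layout, the file extension, and all function names are assumptions made for illustration only; the description does not prescribe any of them.

```python
import re
from pathlib import PurePosixPath


def hms(seconds, sep=":"):
    """Format a time offset in seconds as HH<sep>MM<sep>SS."""
    s = int(seconds)
    return f"{s // 3600:02d}{sep}{(s % 3600) // 60:02d}{sep}{s % 60:02d}"


def slug(text):
    """Reduce text to lower-case letters, digits and underscores for file names."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def label_and_path(source, marker, start, end, ext="mp4"):
    """Return a human-readable label and a filing path for one clipped segment.

    Segments are filed in one directory per content marker (an assumed layout).
    """
    label = f"{marker} ({source}, {hms(start)} to {hms(end)})"
    path = PurePosixPath(slug(marker)) / f"{slug(source)}_{hms(start, '-')}.{ext}"
    return label, str(path)


if __name__ == "__main__":
    # Hypothetical source name and offsets.
    print(label_and_path("archive tape 12", "Bay of Pigs", 3725, 3790))
```

Because the marker that triggered the clip is carried into both the label and the directory name, segments found for the same marker end up filed together.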

Suitable methods for providing stream content derivation include voice recognition, face recognition, object recognition, and image recognition. For example, object recognition could identify instances of the appearance of a specific object (e.g., make and model of a car) in a media stream. Similarly, image recognition could identify instances of the appearance of a specific image (e.g., Mickey Mouse, written words, etc.) in a media stream. Voice recognition could identify particular spoken words and/or help identify the speaker. Face recognition could identify one or more persons appearing in an image. For example, a casino could use automatic clipping of video feeds based on face recognition to expedite identification of persons of interest, such as frequent customers, card counters, etc.

Note that stream content derivation is based on recognizing one or more content elements within the media stream, as opposed to simple pattern recognition of all or part of the media stream representation itself. For example, a method of identifying a musical track by taking a waveform sample of the track and looking for a match of the sample in a comprehensive database would not be stream content derivation, because no content elements within the musical track (e.g., words in the lyrics) are actually recognized in this approach.

As another example, stream content derivation could include recognizing motion in a video feed. This capability can be valuable in security applications, where a video monitor may typically provide a static image, and motion in the video image can be recognized as a kind of “content” and used to cue clipping and forwarding for further analysis. For example, a video camera monitoring a commercial warehouse when the warehouse is closed would ordinarily show a static video image with no relative motion of objects in the images. Using motion recognition to cue automatic clipping enables a reviewer to efficiently concentrate on video segments that may reveal unauthorized activity. This content-based approach is in sharp contrast to more conventional approaches, such as activating a video camera in response to detected motion by using a motion sensor to control the video camera.
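Motion-cued clipping of an otherwise static feed can be sketched as follows. Frame differencing is used here only as an assumed, simple stand-in for motion recognition; the description does not name a specific technique. Frames are assumed to be equal-sized grids of grayscale values, and the thresholds and function names are hypothetical.

```python
def changed_fraction(prev, curr, pixel_delta=20):
    """Return the fraction of pixels whose brightness changed by more than pixel_delta."""
    total = changed = 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += 1
            if abs(p - c) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def motion_segments(frames, fps=1.0, min_fraction=0.05):
    """Return (start_s, end_s) spans in which consecutive frames differ.

    A span starts at the last static frame before motion and ends at the
    last frame that differed from its predecessor.
    """
    segments = []
    start = None
    for i in range(1, len(frames)):
        moving = changed_fraction(frames[i - 1], frames[i]) >= min_fraction
        if moving and start is None:
            start = (i - 1) / fps
        elif not moving and start is not None:
            segments.append((start, (i - 1) / fps))
            start = None
    if start is not None:
        segments.append((start, (len(frames) - 1) / fps))
    return segments


if __name__ == "__main__":
    static = [[10, 10], [10, 10]]
    moved = [[10, 200], [10, 10]]  # one pixel changes sharply
    print(motion_segments([static, static, moved, static, static]))
```

Only the returned spans would be clipped and forwarded for review, so the static remainder of the recording need not be examined.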

The above detailed description is by way of example instead of limitation. Thus the invention can be practiced with various modifications to the above embodiments. For example, the above examples mainly relate to audio media, but the invention is also applicable to video and audio/video media. Also, digital media is considered in the above examples, but the invention is applicable to both analog and digital media.

1. An apparatus for automatically selecting, recording and labeling media tracks from a media stream, the apparatus comprising: at least one processor; and at least one memory operatively coupled to at least one of the at least one processor, the at least one memory having instructions stored thereon which, when executed by at least one of the at least one processor cause the at least one processor to: a) receive user track preferences; b) derive content information relating to said media stream by analysis of said media stream, wherein said content information is not a time schedule of tracks to be played in the future; c) select one or more tracks within said media stream in accordance with said content information and said track preferences; d) flag each of said selected tracks in said media stream; e) record each of said selected tracks to a recording medium based on the flag of said selected tracks; and f) label each of said recorded tracks with a meaningful label derived from said content information; wherein said media stream is substantially continually received by an end-user while being delivered by a provider, and wherein said end-user has no control over content of the media stream.
2. The apparatus of claim 1, wherein said analysis comprises recognizing one or more content elements of said media stream.
3. The apparatus of claim 2, wherein said content elements comprise one or more elements selected from the group consisting of: spoken or written words, individuals, objects, and images.
4. The apparatus of claim 1, wherein said media stream is a broadcast media stream.
5. The apparatus of claim 1, wherein said media stream is a non-broadcast media stream.
6. The apparatus of claim 1, wherein said instructions, when executed by at least one of the at least one processor further cause the at least one processor to create an intermediate recording of at least one segment of the media stream and select one or more tracks within said segment in accordance with said content information and said track preferences based on the intermediate recording.
7. The apparatus of claim 1, wherein the flag for each selected track is a start time and an end time of the corresponding selected track.
8. A method for automatically selecting, recording and labeling media tracks from a media stream, the method comprising: a) receiving user track preferences; b) deriving content information relating to said media stream by analysis of said media stream, wherein said content information is not a time schedule of tracks to be played in the future; c) selecting one or more tracks within said media stream in accordance with said content information and said track preferences; d) flagging each of said selected tracks in said media stream; e) recording each of said selected tracks to a recording medium based on the flag of said selected tracks; and f) labeling each of said recorded tracks with a meaningful label derived from said content information; wherein said media stream is substantially continually received by an end-user while being delivered by a provider, and wherein said end-user has no control over content of the media stream.
9. The method of claim 8, wherein said analysis comprises recognizing one or more content elements of said media stream.
10. The method of claim 8, wherein said content elements comprise one or more elements selected from the group consisting of: spoken or written words, individuals, objects, and images.
11. The method of claim 8, wherein said media stream is a broadcast media stream.
12. The method of claim 8, wherein said media stream is a non-broadcast media stream.
13. The method of claim 8, wherein said instructions, when executed by at least one of the at least one processor further cause the at least one processor to create an intermediate recording of at least one segment of the media stream and select one or more tracks within said segment in accordance with said content information and said track preferences based on the intermediate recording.
14. The method of claim 8, wherein the flag for each selected track is a start time and an end time of the corresponding selected track.